MBOI: Discovery of Business Opportunities on the Internet
Abstract
We propose a tool for the discovery of business opportunities on the Web, more specifically to help a user find relevant calls for tenders (CFTs), i.e. invitations to contractors to submit a tender for their products/services. Simple keyword-based information retrieval does not capture the relationships in the data, which are needed to answer the complex needs of users. We therefore augment keywords with information extracted through natural language processing and business intelligence tools. Unlike in most systems, this information is used at all stages, in both the back-end and the interface. The benefits are twofold: first, we obtain higher precision in search and classification; second, the user gains access to a deeper level of information. Two challenges are how to discover new CFTs and related documents on the Web, and how to extract information from these documents, given that the Web offers no guarantee on the structure and stability of those documents. Major hurdles to the discovery of new documents are the poor degree of “linkedness” between businesses and the open topic area, which make topic-focused Web crawling (Aggarwal et al., 2001) inapplicable. To extract information, wrappers (Soderland, 1999), i.e. tools that can recognise textual and/or structural patterns, have limited success because of the diversity and volatility of Web documents. Since we cannot assume a structure for documents, we exploit information usually contained in CFTs: contracting authority, opening/closing date, location, legal notices, conditions of submission, classification, etc. These can appear marked up with tags or as free text. A first type of information to extract is the so-called named entities (Maynard et al., 2001), i.e. names of people, organisations, locations, times or quantities. To these standard entities we add some application-specific entities such as FAR (regulation number), product dimensions, etc.
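As a rough illustration of extracting standard and application-specific entities from free text, the sketch below combines a few lexical patterns with a small dictionary lookup. It is only a minimal sketch under stated assumptions: the patterns, the `ORGANISATIONS` gazetteer, the sample sentence and the function name `extract_entities` are all hypothetical and are not taken from the system described here.

```python
import re

# Hypothetical lexical rules (illustrative only, far from exhaustive).
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),                   # ISO-style dates
    "money": re.compile(r"\$\d{1,3}(?:,\d{3})*(?:\.\d{2})?"),       # dollar amounts
    "far": re.compile(r"\bFAR\s+\d+\.\d+(?:-\d+)?\b"),              # FAR regulation numbers
}

# Hypothetical dictionary (gazetteer) of known organisation names.
ORGANISATIONS = {"Department of Defense", "Public Works Canada"}


def extract_entities(text):
    """Return (label, surface form, offset) triples sorted by offset."""
    entities = []
    # Rule-based entities: every match of every pattern.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append((label, match.group(0), match.start()))
    # Dictionary-based entities: every occurrence of every known name.
    for name in ORGANISATIONS:
        start = text.find(name)
        while start != -1:
            entities.append(("organisation", name, start))
            start = text.find(name, start + 1)
    return sorted(entities, key=lambda entity: entity[2])


sample = ("The Department of Defense invites tenders under FAR 52.212-1, "
          "closing 2005-06-30, estimated value $25,000.00.")
for label, surface, offset in extract_entities(sample):
    print(f"{offset:3d}  {label:12s}  {surface}")
```

Hand-written patterns like these are brittle in the same way wrappers are, so a real system would need much broader rules and dictionaries than this toy set.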
To extract named entities we use Nstein NFinder™, which uses a combination of lexical rules and a dictionary. More details about the entities, statistics and results can be found in (Paradis and Nie, 2005a). We use another tool, Nstein NConcept™, to extract concepts, which capture the “themes” or “relevant phrases” in a document. NConcept uses a combination of statistics and linguistic rules. As mentioned above, CFTs contain not only information about the subject of the tender, but also procedural and regulatory information. We tag passages in the document as “subject” or “non-subject”, according to the presence or absence of the most discriminant bigrams. Some heuristics are also applied to exploit “good predictors” such as URLs and money, or to further refine the non-subject passages into “regulation”. More details can be found in (Paradis and Nie, 2005b). Another piece of information to extract is the industry or service, according to a classification schema such as NAICS (North American Industry Classification System) or CPV (Common Procurement Vocabulary). We perform multi-schema, multi-label classification, which facilitates use across economic zones (for instance, an American user may not be familiar with CPV, a European standard) and mitigates confusion over schema versions (NAICS version 1997/Canada vs. NAICS version 2002). Our classifier is a simple Naive Bayes, trained over 20,000 documents gathered from an American Government tendering site, FBO (Federal Business Opportunities). Since we have found classification to be sensitive to the pres-
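The multi-schema, multi-label classification mentioned in the abstract can be sketched with one small Naive Bayes model per schema, each returning its top-ranked labels. This is a minimal sketch under stated assumptions, not the authors' implementation: the whitespace tokenisation, the add-one smoothing, the two toy training documents and the label names (`construction`, `roadworks`, `it-services`, `software`) are hypothetical placeholders and not real NAICS or CPV codes.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            tokens = text.lower().split()
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def rank(self, text, k=2):
        """Return the k labels with the highest log-probability score."""
        tokens = text.lower().split()
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)          # class prior
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            scores[label] = score
        return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy training set: each document carries one placeholder label per schema.
TRAINING = [
    ("road paving asphalt resurfacing",
     {"NAICS": "construction", "CPV": "roadworks"}),
    ("software development database maintenance",
     {"NAICS": "it-services", "CPV": "software"}),
]

# One independent classifier per classification schema.
classifiers = {"NAICS": NaiveBayes(), "CPV": NaiveBayes()}
for schema, clf in classifiers.items():
    clf.train([(text, labels[schema]) for text, labels in TRAINING])


def classify(text, k=2):
    """Rank labels under every schema, so a user can read whichever is familiar."""
    return {schema: clf.rank(text, k) for schema, clf in classifiers.items()}


print(classify("asphalt road repair", k=1))
```

Training a separate model per schema keeps the schemas independent, so a document can be labelled under NAICS and CPV at once without any mapping between the two.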
Similar resources
Identification and classification of the business model elements influencing on trading strategy in startup business with a meta-synthesis approach
Nowadays, new opportunities for business development have been created on the internet, which have shaped different business concepts. In addition, the number of e-businesses has increased recently but in many cases, they have faced great challenges for development and survival. These challenges are mainly reduced by having a proper plan for the business model (BM). Despite the extensive studie...
The Sociological Effects of Internet on Educational Opportunity among Students
This paper tries to examine the effects of the Internet on educational opportunity. The Internet is the most useful technology of modern times, helping us not only in our daily lives but also in our professional lives. For educational purposes, it is widely used to gather information and to do research or add to the knowledge of various subjects. The theoretical framework considered with some theories...
Topic Modeling and Classification of Cyberspace Papers Using Text Mining
The global cyberspace networks provide individuals with platforms to interact, exchange ideas, share information, provide social support, conduct business, create artistic media, play games, engage in political discussions, and much more. The term cyberspace has become a conventional means to describe anything associated with the Internet and the diverse Internet culture. In fact, cyberspac...
A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors
Because cyberspace and the Internet predominate in the lives of users, they bring not only business opportunities and time savings but also threats such as information theft and penetration into systems, in both hardware and software. Security is the top priority in preventing cyber-attacks, and users should first detect the type of attack, because virtual environments are not moni...
Dynamic Formation of Business Networks: A Framework for 'Quality of Information'-Based Discovery of Resources
New business opportunities rarely conform to the way the industry traditionally approached the market, which is an opportunity for newcomers and flexible small and medium-sized enterprises to be the first in recognizing and taking advantage of emerging market opportunities. However, newcomers and small and medium-sized enterprises may be too young or too small to possess all the required compet...
Publication date: 2005